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ABSTRACT. Phase retrieval seeks to recover a signal x 6 C p from the amplitude \Ax\ of linear measure- 
ments Ax £ C n . We cast the phase retrieval problem as a non-convex quadratic program over a complex phase 
vector and formulate a tractable relaxation (called PhaseCut) similar to the classical MaxCut semidefinite pro- 
gram. We solve this problem using a provably convergent block coordinate descent algorithm whose structure 
is similar to that of the original greedy algorithm in Gerchberg and Saxton [1972], where each iteration is a 
matrix vector product. Numerical results show the performance of this approach over three different phase 
retrieval problems, in comparison with greedy phase retrieval algorithms and matrix completion formulations. 



1. Introduction 

The phase recovery problem, i.e. the problem of reconstructing a complex phase vector given only the 
magnitude of linear measurements, appears in a wide range of engineering and physical applications. It 
is needed for example in X-ray and crystallography imaging [Harrison, 1993], diffraction imaging [Bunk 
et al., 2007] or microscopy [Miao et al., 2008]. In these applications, the detectors cannot measure the phase 
of the incoming wave and only record its amplitude. Recovering the complex phase of wavelet transforms 
from their amplitude also has applications in audio signal processing [Griffin and Lim, 1984]. 

In all these problems, complex measurements of a signal x G C p are obtained from a linear injective 
operator A, but we can only measure the magnitude vector \Ax\. Depending on the properties of A, the 
phase of Ax may or may not be uniquely characterized by the magnitude vector \Ax\, up to an additive 
constant, and it may or may not be stable. For example, if A is a one-dimensional Fourier transform, then 
the recovery is not unique but it becomes unique for an oversampled two-dimensional Fourier transform, 
although it is not stable. Uniqueness is also obtained with an oversampled wavelet transform operator A, 
and the recovery of x from \Ax\ is then continuous [Waldspurger and Mallat, 2012]. If x is multiplied by 
several random filters before computing its Fourier transform then uniqueness can be proved with weak 
stability results [Candes et al., 201 lb]. 

Recovering the phase of Ax from \Ax\ is a nonconvex optimization problem. Until recently, this prob- 
lem was solved using various greedy algorithms [Gerchberg and Saxton, 1972; Fienup, 1982; Griffin and 
Lim, 1984], which alternate projections on the range of A and on the nonconvex set of vectors y such that 
\y\ = \Ax\. However, these algorithms often stall in local minima. A convex relaxation called PhaseLift 
was introduced in [Chai et al., 2011] and [Candes et al., 2011b] by observing that \ Ax\ 2 is a linear function 
of X = xx* which is a rank one Hermitian matrix. The recovery of x is thus expressed as a rank minimiza- 
tion problem over positive semidefinite Hermitian matrices X satisfying some linear conditions. This last 
problem is approximated by a semidefinite program which has been shown to recover x for several classes 
of linear operators A [Candes et al., 2011b,a]. 

Our main contribution here is to formulate phase recovery as a quadratic optimization problem over 
the unit complex torus. We then write a convex relaxation to phase recovery very similar to the MaxCut 
semidefinite program (we call this relaxation PhaseCut in what follows). While the resulting SDP is typically 
larger than the PhaseLift relaxation, its simple structure (the constraint matrices are singletons) allows us to 
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solve it very efficiently. In particular, this allows us to use a provably convergent block coordinate descent 
algorithm whose structure is similar to that of the original greedy algorithm in Gerchberg and Saxton [1972] 
(each iteration is a matrix vector product, which can be computed efficiently). Furthermore, we show, under 
the condition that A is injective and b is not noisy, an equivalence result between PhaseLift and a modified 
version of PhaseCut. This result implies that both algorithms are simultaneously tight. In a noisy setting, 
one can show that PhaseCut is at least as stable as a variant of PhaseLift, while PhaseCut empirically appears 
to be more stable in some cases, e.g. when b is sparse. 

Seeing the MaxCut relaxation emerge in a phase recovery problem is not entirely surprising: it appears, 
for example, in an angular synchronisation problem where one seeks to reconstruct a sequence of angles 
9-i (up to a global phase), given information on pairwise differences Q\ — 6j mod. 2n, for G S [see 
Singer, 201 1], the key difference between this last problem and the phase recovery problem in ( 1) is that the 
sign information is lost in the input to (1). Complex MaxCut relaxations of decoding problems also appear 
in maximum-likelihood channel detection [Luo et al., 2003; Kisialiou and Luo, 2010; So, 2010]. From 
a combinatorial optimization perspective, showing the equivalence between phase recovery and MaxCut 
allows us to expose a new class of nontrivial problem instances where the semidefmite relaxation for a 
MaxCut-like problem is tight, together with explicit conditions for tightness directly imported from the 
matrix completion formulation of these problems. 

The paper is organized as follows. Section 2 explains how to factorize away the magnitude information 
to form a nonconvex quadratic program on the phase vector u G C" satisfying \ui\ = 1 for i = 1, . . . , n, 
and a greedy algorithm is derived in Section 2.3. We then derive a tractable relaxation of the phase recovery 
problem, written as a semidefmite program similar to the classical MaxCut relaxation in [Goemans and 
Williamson, 1995], and detail several algorithms for solving this problem in Section 3. Section 4 proves 
that a variant of PhaseCut and PhaseLift are equivalent in the noiseless case and thus simultaneously tight. 
We also prove that PhaseCut is as stable as a weak version of PhaseLift and discuss the relative complexity 
of both algorithms. Finally, Section 5 performs a numerical comparison between the greedy, PhaseLift and 
PhaseCut phase recovery algorithms for three phase recovery problems, in the noisy and noiseless case. In 
the noisy case, these results suggest that if b is sparse, then PhaseCut may be more stable than PhaseLift. 

Notations. We write S p (resp. H p ) the cone of symmetric (resp. Hermitian) matrices of dimension p ; 
(resp. Hj") denotes the set of positive symmetric (resp. Hermitian) matrices. We write || • || p the Schatten 
p-norm of a matrix, that is the p-norm of the vector of its eigenvalues (in particular, || • ||oo is the spectral 
norm). We write A' the (Moore-Penrose) pseudoinverse of a matrix A and WA]]^ the sum of the modulus of 
the coefficients of A. For x G W, we write diag(x) the matrix with diagonal x. When X G H p however, 
diag(X) is the vector containing the diagonal elements of X. For X G H p , X* is the Hermitian transpose 
of X, with X* = (X) T . Finally, we write b 2 the vector with components bf, i = 1, . . . , n. 



2. Phase recovery 

The phase recovery problem seeks to retrieve a signal x G C p from the amplitude b 
measurements, solving 

find x 
such that \Ax\ = b, 

in the variable x G C p , where A G C nxp and b G W 1 . 

2.1. Greedy Optimization in the Signal. Approximate solutions x of the recovery problem in (1) are 
usually computed from b = \Ax\ using algorithms inspired from the alternating projection method in [Ger- 
chberg and Saxton, 1972]. These algorithms compute iterates y k in the set F of vectors y G C n such that 
\y\ = b = \ Ax\, which are getting progressively closer to the image of A. The Gerchberg-Saxton algorithm 
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\Ax\ of n linear 
(1) 



projects the current iterate y k on the image of A using the orthogonal projector AA^ and adjusts to 6j the 
amplitude of each coordinate. We describe this method explicitly below. 



Algorithm 1 Gerchberg-Saxton. 



Input: An initial y 1 € F, i.e. such that |?/| = b. 
l: forfc = l,...,iV-ldo 
2: Set 



Vi +1 = h ||^tyfc)*| ' i = h---,n. (Gerchberg-Saxton) 



3: end for 
Output: yjy £F. 



Because F is not convex however, this alternating projection method usually converges to a stationary 
point y°° which does not belong to the intersection of F with the image of A, and hence l^^y 00 ) ^ b. 
Several modifications proposed in [Fienup, 1982] improve the convergence rate but do not eliminate the 
existence of multiple stationary points. To guarantee convergence to a unique solution, which hopefully be- 
longs to the intersection of F and the image of A, this non-convex optimization problem has recently been 
relaxed as a semidefinite program [Chai et al., 2011; Candes et al., 201 lb], where phase recovery is formu- 
lated as a matrix completion problem (described in Section 4). Although the computational complexity of 
this relaxation is much higher than that of the Gerchberg-Saxton algorithm, it is able to recover x from \Ax\ 
(up to a multiplicative constant) in a number of cases [Chai et al., 2011; Candes et al., 2011b]. 

2.2. Splitting Phase and Amplitude Variables. As opposed to these strategies, we solve the phase recov- 
ery problem by explicitly separating the amplitude and phase variables, and by only optimizing the values 
of the phase variables. In the noiseless case, we can write Ax = diag(6)-u where u £ C n is a phase vector, 
satisfying |uj| = 1 for i = 1, . . . , n. Given b = \Ax\, the phase recovery problem can thus be written as 

min \\Ax — diagf&Wllo, 

ueC n , K|=i, 
xecp 

where we optimize over both variables u G C n and x € C p . In this format, the inner minimization problem 
in x is a standard least squares and can be solved explicitly by setting 

x = A^ diag(6)u, 

which means that problem ( 1 ) is equivalent to the reduced problem 

min \\AA^ diag(6)it — diag(6)u||2- 

|«-;| =1 

The objective of this last problem can be rewritten as follows 

||^ t diag(6)-u-diag(6)w||| = \\{AA ] - I) diag(6)«||| 

= u* diag(6 T )M diag(6)u. 

where M = (AA^ - T)*(AA^ - I) = I - AAl Finally, the phase recovery problem ( 1 ) becomes 

minimize u*Mu 

subject to \ui\ = 1, i = 1, . . . n, 

in the variable u £ C n , where the Hermitian matrix 

M = diag(6)(I - AA^) diag(6) 
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is positive semidefinite. The intuition behind this last formulation is simple, (I — AJV) is the orthogonal 
projector on the orthogonal complement of the image of A (the kernel of A*), so this last problem simply 
minimizes in the phase vector u the norm of the component of diag(6)u which is not in the image of A. 



2.3. Greedy Optimization in Phase. Having transformed the phase recovery problem (1) in the quadratic 
minimization problem (2), suppose that we are given an initial vector u E C" , and focus on optimizing over 
a single component Ui for i = 1, . . . , n. The problem is equivalent to solving 



in the variable U{ G 
amounts to solving 



which means 



minimize UiMuUi + 2 Re UjMjiU, 
subject to \v,i\ = 1, i = 1, . . . n, 



where all the other phase coefficients uj remain constant. Because \m\ = 1 this then 



min Re 

Kl=i 




Ej^i M jiUj 



(3) 



for each i = 1, . . . ,n, when u is the optimum solution to problem (2). We can use this fact to derive 
Algorithm 2, a greedy algorithm for optimizing the phase problem. 



Algorithm 2 Greedy algorithm in phase. 

Input: An initial u G C n such that \m\ = 1, i = 1, . . . , n. An integer N > 1. 
1: for k = l,...,iVdo 
2: for i = 1, . . . n do 
3: Set 

m = — — 

4: end for 
5: end for 

Output: u 6 C n such that |uj| = 1, i = 1, . . . , n. 



This greedy algorithm converges to a stationary point u°°, but it is generally not a global solution of 
problem (2), and hence \AA^ diag(n°°)6| / b. It has often nearly the same stationary points as the Ger- 
chberg-Saxton algorithm. One can indeed verify that if u°° is a stationary point then y°° = diag(-u°°)6 is 
a stationary point of the Gerchberg-Saxton algorithm. Conversely if b has no zero coordinate and y°° is a 
stable stationary point of the Gerchberg-Saxton algorithm then n?° = yf° /\yf°\ defines a stationary point of 
the greedy algorithm in phase. 

If Ax can be computed with a fast algorithm using 0(n log n) operations, which is the case for Fourier or 
wavelets transform operators for example, then each Gerchberg-Saxton iteration is computed with 0(n log n) 
operations. The greedy phase algorithm above does not take advantage of this fast algorithm and requires 
0(n 2 ) operations to update all coordinates Uj for each iteration k. However, we will see in Section 3.6 that 
a small modification of the algorithm allows for 0{n log n) iteration complexity. 
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2.4. Complex MaxCut. Following the classical relaxation argument in [Shor, 1987; Lovasz and Schrijver, 
1991; Goemans and Williamson, 1995; Nesterov, 1998], we first write U = uu* G H n . Problem (2), written 

QP{M) = min. u*Mu 

subject to \u{\ = 1, i = 1, . . . n, 

in the variable u G C n , is equivalent to 

min. Tr(UM) 
subject to diag([7) = 1 

U h 0, Rank(C7) = 1, 

in the variable £7 G H n . After dropping the (nonconvex) rank constraint, we obtain the following convex 
relaxation 

SDP(M)± min. Tr(UM) (PhaseCut) 

subject to diag([7) = 1, U h 0, ^nasecut; 

which is a semidefinite program (SDP) in the matrix U G H n and can be solved efficiently. When the 
solution of problem PhaseCut has rank one, the relaxation is tight and the vector u such that U = uu* is an 
optimal solution of the phase recovery problem (2). If the solution has rank larger than one, a normalized 
leading eigenvector v of U is used as an approximate solution, and diag(C7 — vv T ) gives a measure of the 
uncertainty around the coefficients of v. 

In practice, semidefinite programming solvers are rarely designed to directly handle problems written 
over Hermitian matrices and start by reformulating complex programs in H n as real semidefinite programs 
over S2n based on the simple facts that follow. For Z, Y G H n , we define T(Z) G S2n as in [Goemans and 
Williamson, 2001] 

™-(%% 

so that Tr(T{Z)T(Y)) = 2 Tr(ZY). By construction, Z G H n iff T(Z) G S 2n - One can also check that 
z = x + iy is an eigenvector of Z with eigenvalue A if and only if 

x \ j ( —y 

1 and 1 



y j v x 

are eigenvectors of T{Z), both with eigenvalue A (depending on the normalization of z, one corresponds 
to (Re(z), Im(z)), the other one to (Re(i z), Im(i z)). This means in particular that Z y if and only if 

T(Z) y o. 

We can use these facts to formulate an equivalent semidefinite program over real symmetric matrices, 
written 

minimize Tr(T(M)X) 
subject to X i;i + X n+ i t n +i = 2 

^i)3 — Xn+i,n+j > -^-n+i,j — -^i,n+ji ^ij — 1, ... ,72, 

X h o, 

in the variable X in S2 n - This last problem is equivalent to PhaseCut. In fact, because of symmetries in 
T(M), the equality constraints enforcing symmetry can be dropped, and this problem is equivalent to a 
MaxCut like problem in dimension 2n, which reads 

minimize Tr(T(M)X) 

subject to diag(X) = 1 (5) 

X t 0, 

in the variable X in S2 n - As we will see below, formulating a relaxation to the phase recovery problem as a 
complex MaxCut-like. semidefinite program has direct computational benefits. 
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3. Algorithms 



In the previous section, we have approximated the phase recovery problem (2) by a convex relaxation, 
written 

minimize Tr(UM) 
subject to diag(CZ) = 1, U y 0, 
which is a semidefinite program in the matrix U € H n . The dual, written 

max nA m ; n (M + diag(u>)) — l T w, (6) 

is a minimum eigenvalue maximization problem in the variable w G W l . Both primal and dual can be solved 
efficiently. When exact phase recovery is possible, the optimum value of the primal problem PhaseCut is 
zero and we must have A m ; n (M) = 0, which means that w = is an optimal solution of the dual. 

3.1. Interior Point Methods. For small scale problems, with n ~ 10 2 , generic interior point solvers such 
as SDPT3 [Toh et al., 1999] solve problem (5) with a complexity typically growing as O (n 4 5 log(l/e)) 
where e > is the target precision [Ben-Tal and Nemirovski, 2001, §4.6.3]. Exploiting the fact that the 
2n equality constraints on the diagonal in (5) are singletons, Helmberg et al. [1996] derive an interior point 
method for solving the MaxCut problem, with complexity growing as 

0(n 3 - 5 log(l/e)) 

where the most expensive operation at each iteration is the inversion of a positive definite matrix, which 
costs 0(n 3 ) flops. 

3.2. First-Order Methods. When n becomes large, the cost of running even one iteration of an interior 
point solver rapidly becomes prohibitive. However, we can exploit the fact that the dual of problem (5) 
can be written (after switching signs) as a maximum eigenvalue minimization problem. Smooth first-order 
minimization algorithms detailed in [Nesterov, 2007] then produce an e-solution after 
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floating point operations. Each iteration requires forming a matrix exponential, which costs 0(n 3 ) flops. 
This is not strictly smaller than the iteration complexity of specialized interior point algorithms, but matrix 
structure often allows significant speedup in this step. Finally, the simplest subgradient methods produce an 
e-solution in 

/n 2 logn N 
V e 2 

floating point operations. Each iteration requires computing a leading eigenvector which has complexity 
roughly 0(n 2 log n). 

3.3. Block Coordinate Descent. We can also solve the semidefinite program in PhaseCut using a block 
coordinate descent algorithm. While no explicit complexity bounds are available for this method in our 
case, the algorithm is particularly simple and has a very low cost per iteration (it only requires computing 
a matrix vector product). We write i c the index set {1, . . . , i — 1, i + 1, . . . , n} and describe the method as 
Algorithm 3. 

Block coordinate descent is widely used to solve statistical problems where the objective is separable 
(LASSO is a typical example) and was shown to efficiently solve semidefinite programs arising in covariance 
estimation [d'Aspremont et al., 2006]. These results were extended by [Wen et al., 2009] to a broader class 
of semidefinite programs, including MaxCut. We briefly recall its simple construction below, applied to a 
barrier version of the MaxCut relaxation PhaseCut, written 

minimize Tr(UM) - /^logdet(77) 
subject to diag([7) = 1 



which is a semidefinite program in the matrix U G H n , where p, > is the barrier parameter. As in interior 
point algorithms, the barrier enforces positive semidefiniteness and the value of \i > precisely controls the 
distance between the optimal solution to (7) and the optimal set of PhaseCut. We refer the reader to [Boyd 
and Vandenberghe, 2004] for further details. The key to applying coordinate descent methods to problems 
penalized by the logdet(-) barrier is the following block-determinant formula 

det(C7) = det(S) det(y - x T B~ 1 x), when U = ( ^ T * Y U y 0. (8) 

This means that, all other parameters being fixed, minimizing the function det(X) in the row and column 
block of variables x, is equivalent to minimizing the quadratic form y — x T Z~ 1 x, arguably a much simpler 
problem. Solving the semidefinite program (7) row/column by row/column thus amounts to solving the 
simple problem (9) described in the following lemma. 

Lemma 3.1. Suppose a > 0, c G R™ -1 , and B G S n _i are such that b ^ and B )- 0, then the optimal 
solution of the block problem 

min c T x — <jlog(l — x T B~ x x) (9) 

X 

is given by 



V a2 + 7 - o" o 
x = — Be 

7 

where 7 = c 7 Be. 

Proof. As in [Wen et al., 2009], a direct consequence of the first order optimality conditions for (9). ■ 

Here, we see problem (7) as an unconstrained minimization problem over the off-diagonal coefficients 
of U, and (8) shows that each block iteration amounts to solving a minimization subproblem of the form (9). 
Lemma 3. 1 then shows that this is equivalent to computing a matrix vector product. Linear convergence 
of the algorithm is guaranteed by the result in [Boyd and Vandenberghe, 2004, §9.4.3] and the fact that 
the function log det is strongly convex over compact subsets of the positive semidefinite cone. So the 
complexity of the method is bounded by 

O flog - 



e 

but the constant in this bound depends on n here, and the dependence cannot be quantified explicitly. 

Algorithm 3 Block Coordinate Descent Algorithm for PhaseCut. 

Input: An initial X° = I n and v > (typically small). An integer N > 1. 



for k = l,...,JVdo 
Picki G [l,n]. 
Compute 



X% icMic^ and 7 = x*M;< 



4: If 7 > 0, set 



else 



xl 1+ 1 = x?i u = 0. 



5: end for 

Output: A matrix X >z with diag(X) = 1. 
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3.4. Initialization & Randomization. Suppose the Hermitian matrix U solves the semidefinite relax- 
ation PhaseCut. As in [Goemans and Williamson, 2001; Ben-Tal et al., 2003; Zhang and Huang, 2006; 
So et al., 2007], we generate complex Gaussian vectors xeC" with x ~ Wc(0, U), and for each sample x, 
we form z G C n such that 

Zi = - — - 1 = 1, ...,n. 

| 

All the sample points z generated using this procedure satisfy \z%\ = 1, hence are feasible points for prob- 
lem (2). This means in particular that QP(M) < E[z*Mz]. In fact, this expectation can be computed 
almost explicitly, using 

E[zz*] = F(U), with F(w) = \e isx ^ / cos(fl) arcsin(H cos(9))d9 

Jo 

where F(U) is the matrix with coefficients F(Uij), i,j = l,...,n. We then get 

SDP(M) < QP(M) < Tr(MF(U)) (10) 

In practice, to extract good candidate solutions from the solution U to the SDP relaxation in PhaseCut, we 
sample a few points from A/"c(0, U), normalize their coordinates and simply pick the point which mini- 
mizes z*Mz. 

This sampling procedure also suggests a simple spectral technique for computing rough solutions to prob- 
lem PhaseCut: compute an eigenvector of M corresponding to its lowest eigenvalue and simply normalize 
its coordinates (this corresponds to the simple bound on MaxCut by [Delorme and Poljak, 1993]). The infor- 
mation contained in U can also be used to solve a robust formulation [Ben-Tal et al., 2009] of problem (1) 
given a Gaussian model u ~ A/c(0, U). 

3.5. Approximation Bounds. The semidefinite program in PhaseCut is a MaxCut-type graph partitioning 
relaxation whose performance has been studied extensively. Note however, that most approximation results 
for MaxCut study maximization problems over positive semidefinite or nonnegative matrices, while we are 
minimizing in PhaseCut so we do not inherit the constant approximation ratios that hold in the MaxCut 
setting. The approximation results in [Goemans and Williamson, 2001; Ben-Tal, Nemirovski, and Roos, 
2003; Zhang and Huang, 2006; So, Zhang, and Ye, 2007] use the randomization argument above to show 
that the optimum SDP(W) of PhaseCut approximates that of problem (1) with an approximation ratio of 
7r/4 when the objective matrix W S H n is negative semidefinite (all the results cited above are focused on 
maximization problems, hence the signs are switched), i.e. 

SDP(-W) < QP(-W) < jSDP(-W). 

A similar bound (in tt/2) holds in the binary case where u 6 1™, and this last bound cannot be improved, 
as shown in [Alon and Naor, 2004]. If we only assume that diag(VF) = 0, then the references above also 
show that the optimal values of problems (2) and PhaseCut satisfy QP(W) < and 

SDPiW) < QP(W) < -^—SDP{W), 

log n 

where c > is an absolute constant, [Ben-Tal et al., 2003] show that this bound too is unimprovable without 
further assumptions on the structure of A. In our case, setting W = M — A max (M)I means that 

SDP(M) < QP(M) < ^SDP(M) + (l - ~) nA max (M). 

This produces a bound on the quality of the approximation of the phase recovery problem (2) by the semi- 
definite relaxation PhaseCut. More importantly, we will see in the next section that we can also obtain 
explicit conditions for this relaxation to be exact. 



3.6. Exploiting Structure. In some instances, we have additional structural information on the solution of 
problems (1) and (2), which usually reduces the complexity of approximating PhaseCut and improves the 
quality of the approximate solutions. We briefly highlight a few examples below. 

3.6.1. Symmetries. In some cases, e.g. signal processing examples where the signal is symmetric, the 
optimal solution re has a known symmetry pattern. For example, we might have u{k- — i) = u{k + + i) 
for some k + and indices i £ [0, k^ — 1]. This means that the solution u to problem (1) can be written 
u = Pv, where DeC with q < n, and we can solve (1) by focusing on the smaller problem 

minimize v*P*MPv 

subject to \(Pv)i\ = I, i = l,...n, 

in the variable v £ C q . We reconstruct a solution re to (1) from a solution v to the above problem as re = Pv. 
This produces significant computational savings. 

3.6.2. Alignment. In other instances, we might have prior knowledge that the phases of certain samples are 
aligned, i.e. that there is an index set / such that 

ui = Uj, for all i, j £ /, 

this reduces to the symmetric case discussed above when the phase is arbitrary. W.l.o.g., we can also fix the 
phase to be one, with Uj = 1 for i £ /, and solve a constrained version of the relaxation PhaseCut 

min. Tr(UM) 

subject to Uij = 1, i,j £ J, 

diag(J7) = l,UhQ, 

which is a semidefinite program in U £ H n . 

3.6.3. Fast Fourier transform. If the product Mx can be computed with a fast algorithm in 0(n log re) 
operations, which is the case for Fourier or wavelet transform operators, we significantly speed up the 
iterations of Algorithm 3 to update all coefficients at once. Each iteration of the modified Algorithm 3 then 
has cost 0{n\ogn) instead of 0(n 2 ). 

3.6.4. Real valued signal. In some cases, we know that the solution vector x in (1) is real valued. Prob- 
lem (1) can be reformulated to explicitly constrain the solution to be real, by writing it 

min \\Ax — diaef&Wlln 

ueC",K|=i, 

or again, using the operator T(-) defined in (4) 



minimize 



T(A)[ J -dia g (l) (£*;> 



subject to re £ C n , = 1 

x £ WP. 

The optimal solution of the inner minimization problem in x is given by x = where 

^ 2= (fm(l))' ^ = diag(^), and v = ( £g 

hence the problem is finally rewritten 

minimize 1 1 (A2 A^B2 — -62)^ || | 
subject to vf + v^ +i = 1, i = l,...,n, 
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in the variable v G R . This can be relaxed as above by the following problem 

minimize Tr(VM 2 ) 

subject to Vi + V% +i>n+i = 1, i = 1, . . . , n, 

which is a semidefinite program in the variable V G S2n, where M2 = {A2A\B2 — B2) 1 \A2A\B2 — B2) = 
BT{l-A 2 A\)B 2 . 



4. Matrix completion & exact recovery conditions 

In [Chai et al., 2011; Candes et al., 2011b], phase recovery (1) is cast as a matrix completion problem. 
We briefly review this approach and compare it with the semidefinite program in PhaseCut. Given a signal 
vector b G R n and a sampling matrix A G C nxp , we look for a vector x G C p satisfying 

\a*x\ = bi, i = 1, . . . , n, 

where the vector a* is the i th row of A and x G C p is the signal we are trying to reconstruct. The phase 
recovery problem is then written as 

minimize Rank(X) 
subject to Tr(aia*X) = bf, i = l,...,n 
X ± 

in the variable X G H p , where X = xx* when exact recovery occurs. This last problem can be relaxed as 

minimize Tr(X) 

subject to Tr(aia*X) = bf, i = 1,... ,n (PhaseLift) 
X y 

which is a semidefinite program (called PhaseLift by Candes et al. [2011b]) in the variable X G Hp. Recent 
results in [Recht et al., 2010; Candes and Tao, 2010] give explicit (if somewhat stringent) conditions on A 
and x under which the relaxation is tight (i.e. the optimal X in PhaseLift is unique, has rank one, with 
leading eigenvector x). 

4.1. Weak Formulation. We also introduce a weak version of PhaseLift, which is more directly related 
to PhaseCut and is easier to interpret geometrically. It was noted in [Candes et al., 2011a] that, when 
I G Vectjaja*}, the condition Tr(aia*X) = bf,i = 1, ...,n determines Tr(X), so in this case the trace 
minimization objective is redundant and PhaseLift is equivalent to 

find X 

subject to Tr (aia*X) =bf, i = l,...,n (Weak PhaseLift) 

X y 0. 

When I ^ Vect{ aia*} on the other hand, Weak PhaseLift and PhaseLift are not equivalent: solutions of 
PhaseLift solve Weak PhaseLift too but the converse is not true. Interior point solvers typically pick a 
solution at the analytic center of the feasible set of Weak PhaseLift which in general can be significantly 
different from the minimum trace solution. 

However, in practice, the removal of trace minimization does not really seem to alter the performances of 
the algorithm. We will illustrate this affirmation with numerical experiments in §5.4 and a formal proof is 
given in [Demanet and Hand, 2012]. Moreover, these authors showed that, in the case of Gaussian random 
measurements, the relaxation of Weak PhaseLift was tight with high probability under the same conditions 
as PhaseLift. 
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4.2. Phase Recovery as a Projection. We will see in what follows that phase recovery can interpreted as a 
projection problem. Indeed, the PhaseCut reconstruction problem defined in PhaseCut is written 

minimize Tr(UM) 

subject to diag(?7) = 1, U h 0, 

with M = diag(6)(I — AJv) diag(fe). In what follows, we assume hi ^ 0, i = 1, n, which means that, 
after scaling U, solving PhaseCut is equivalent to solving 

minimize Tr(V(I - AA^)) 

subject to diag(y) = b 2 (11) 
V h 0. 

In the following lemma, we show that this last semidefinite program can be understood as a projection 
problem on a section of the semidefinite cone using the trace (or nuclear) norm. We define 

F = {V G H n : x*Vx = 0, Vx G K(A)*-} 

which is also 

J = {^eH n :(I - Atf)V(I - AA^) = 0}, 
and we now formulate the objective of problem (11) as a distance. 

Lemma 4.1. For all 7sH„ such that V >z 0, 

Tr(V(I-AA^) = dx(V,T) (12) 
where d\ is the distance associated to the trace norm. 

Proof. Let B\ (resp. B2) be an orthonormal basis of range A (resp. (range A) L ). Let T be the transfor- 
mation matrix from canonical basis to orthonormal basis B\ U £>2- Then 

f = {v g H n s.t. rVr = ( si S o ),Si€ H p , S 2 G M p , n - P } 



As the transformation X — y T 1 XT preserves the nuclear norm, for every matrix V >z 0, if we write 

T~ l VT : 

then the orthogonal projection of V onto F is 



Vl v 2 

v 3 



w = t(Ya v A t- 1 



v 2 * 

sodi(V,^ = \\V-Wh = || (^ 3 ) AsV t 0, (v*vl) t ohence (ov 8 ) ^ °. sodi(F,J-) = 
Tr ( § y 3 ) - Because AA^ is the orthogonal projection onto TZ(A), we have 

T~ 1 (I-AA^)T = (°°) 

hence 

di(V,J") = Tr([j^) =Tr((T- 1 yT)(T- 1 (I- AAt)T)) = Tr(F(I - AA*)) 
which is the desired result. ■ 

This means that PhaseCut can be written as a projection problem, i.e. 

minimize d\(V, F) 

subject to veu+nn b { ' 

in the variable V G H n , where Hi, = {V G H n s.t. Vii = bf,i = 1, n}. Moreover, with 04 the i-th row 
of A, we have for all X G H+ 

Tr(aja*X) = a*Xaj = diag(AX^4*)j, i = 1, . . . , n, 
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Figure 1. Schematic representation of the sets involved in equations (13) and (14) : the 
cone of positive hermitian matrices H+ (in light grey), its intersection with the affine sub- 
space Hi,, and F n H+, which is a face of H+. 

so if we call V = AX A* £ F, when A is injective, X = A^VA^* and Weak PhaseLift is equivalent to 

find V 6 H+ n F 

subject to diag(y) = b 2 . 

First order algorithms for Weak PhaseLift will typically solve 

minimize d(diag(V), b 2 ) 
subject to V € H+ n F 

for some distance d over M n . If d is the / s -norm, for any s > 1, d(diag(V), b 2 ) = d s (V, Tib), where d s is 
the distance generated by the Schatten s-norm, the algorithm becomes 

minimize d s (V, Tib) A 
subject to V G H+ n F 

which is another projection problem in V. 

Thus, PhaseCut and Weak PhaseLift are comparable, in the sense that both algorithms aim at finding a 
point of n F n Tib but PhaseCut does so by picking a point of n Tit and moving towards F while 
Weak PhaseLift moves a point of n F towards T-L^. We can push the parallel between both relaxations 
much further. We will show in what follows that, in a very general case, PhaseLift and a modified version 
of PhaseCut are simultaneously tight. We will also be able to compare the stability of Weak PhaseLift and 
PhaseCut when measurements become noisy. 

4.3. Tightness of the Semidefinite Relaxation. We will now formulate a refinement of the semidefinite 
relaxation in PhaseCut and prove that this refinement is equivalent to the relaxation in PhaseLift under mild 
technical assumptions. Suppose u is the optimal phase vector, we know that the optimal solution to (1) 
can then be written x = A* diag(6)u, which corresponds to the matrix X = A* dia.g(b)uu* diag(6)^4^* 
in PhaseLift, hence 

Tr(X) = Tr(diag(6)^ t M t diag(6)uu*). 

Writing B = di&g(b)A^*A^ diag(fe), when problem (1) is solvable, we look for the "minimum trace" 
solution among all the optimal points of relaxation PhaseCut by solving 

SDP2(M) = min. Tr(BU) 

subject to Tr(MU) = (PhaseCutMod) 
diag(U) = 1,U>:0, 



which is a semidefinite program in U E H n . When problem (1) is solvable, then every optimal solution 
of the semidefinite relaxation PhaseCut is a feasible point of relaxation PhaseCutMod. In practice, the 
semidefinite program SDP(M + jB), written 

minimize Tr((M + jB)U) 
subject to diag(?7) = 1, U y 0, 

obtained by replacing M by M + 7.B in problem PhaseCut, will produce a solution to PhaseCutMod when- 
ever 7 > is sufficiently small. This means that all algorithms (greedy or SDP) designed to solve the 
original PhaseCut problem can be recycled to solve PhaseCutMod with negligible effect on complexity. We 
now show that the PhaseCutMod and PhaseLift relaxations are simultaneously tight when A is injective. 

Proposition 4.2. Assume that b{ 7^ for i = 1, . . . ,n, that A is injective and that there is a solution x 
to (I). The function 

$:H p -)> H„ 

X h-> $pQ = diag(6)- 1 AXA*diag(6)- 1 

is a bijection between the feasible points of PhaseCutMod and those of PhaseLift. 

Proof. Note that <I> is injective whenever b > and A has full rank. It remains to show that X is a feasible 
point of PhaseLift whenever 3>(X) is a feasible point of PhaseCutMod. We first show that 

Tr(MU) = 0, U h 0, (15) 

is equivalent to 

U = $(X) (16) 

for some X >z 0. Observe that Tr(UM) = means UM = because U, M >z 0, hence Tr(MU) = 
in (15) is equivalent to 

AA t diag(6)C/diag(6) = diag(&)[/diag(6) 

because 6 > and M = diag(6)(I - AA^) diag(6). This means 7^(diag(6)C7 diag(6)) C TZ{A) because 
AA^ is the orthogonal projection onto range(A). In fact, if we set X = A* diag(6)C7 diag(6)yl^*, this last 
equality implies 

AX = ^ t diag(6)C/diag(6)^ t * = diag^C/diagO)^ 1 "* 

and 

AX A* = diag(6)C/diag(6)^ t *^* = diag(6)[/diag(6) 

which is U = &(X), and shows (15) implies (16). Conversely, if U = $(X) then diag(b)C7 diag(6) = 
AX A* and using AA^A = A, we get AX A* = AA^AXA* = AA^ diag(6)C7 diag(6) which means 
MU = 0, hence (15) is in fact equivalent to (16) since U >z by construction. 

Now, if X is feasible for PhaseLift, we have shown Tr(M$(X)) = and 4>(X) >z 0, moreover 
diag($(X))j = Tr(aia*X)/bf = 1, so U = <f>(X) is a feasible point of PhaseCutMod. Conversely, 
if U is feasible for PhaseCutMod, we have shown that there exists X y such that U = &(X) which 
means diag(6)C/diag(6) = AX A*. We also have TV(aja*X) = bfUu = bf, which means X is feasible 
for PhaseLift and concludes the proof. ■ 

We now have the following central corollary showing the equivalence between PhaseCutMod and PhaseLift 
in the noiseless case. 

Corollary 4.3. If A is injective, 7^ Ofor all i = 1, n and if the reconstruction problem (1) admits an 
exact solution, then PhaseCutMod is tight (i.e. has a unique rank one solution) whenever PhaseLift is. 
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Proof. When A is injective, Tr(X) = Tr(B$(X)) and Rank(X) = Rank($(X)). ■ 



This last result shows that in the noiseless case, the relaxations PhaseLift and PhaseCutMod are in fact 
equivalent. In the same way, we could have shown that Weak PhaseLift and PhaseCut were equivalent. The 
performances of both algorithms may not match however when the information on b is noisy and perfect 
recovery is not possible. 

4.4. Stability in the Presence of Noise. We now consider the case where the vector of measurements b 
is of the form b = \Axq\ + 6 n oisc- We first introduce a definition of C-stability for PhaseCut and Weak 
PhaseLift. The main result of this section is that, when Weak PhaseLift is stable at a point, PhaseCut is 
stable too, with a constant of the same order. The converse does not seem to be true when b is sparse. 

Definition 4.4. Let xq G C n , C > 0. The algorithm PhaseCut (resp. Weak PhaseLift) is said to be C -stable 
at xq iff for all 6 n oise G W 1 close enough to zero, every minimizer V of equation (13) (resp. (14)) with 
b = | Axq I + 6 no i ae , satisfies 

\\V-(Ax )(Ax yh<C\\Ax \\ 2 \\ 

"noise 1 1 2 • 

The following matrix perturbation result motivates this definition, by showing that a C-stable algorithm 
generates a 0(C||& no ise||2)-error over the signal it reconstructs. 

Proposition 4.5. Let C > be arbitrary. We suppose that Axq ^ and \\V — (Axo)(Axq)*\\2 — 
CH^xol^ll&noiselb < II Axo \\ \j2. Let y be V's main eigenvector, normalized so that (Axo)*y = ||^4a?o II 2- 
Then 

\\y - Axq\\ = 0(C||6noise||2), 

and the constant in this last equation does not depend upon A, xq, C or \\b\\ 2 . 

Proof. We use [El Karoui and d'Aspremont, 2009, Eq.10] for 

Ax Q y V - {Ax )(Ax )* 
u = it~a — n - v = — n - E = n - ; — rn? 

||Aco||2 ll^0||2 ll^olll 

This result is based on [Kato, 1995, Eq. 3.29], which gives a precise asymptotic expansion of u — v. For our 
purposes here, we only need the first-order term. See also Bhatia [1997], Stewart and Sun [1990] or Stewart 
[2001] among others for a complete discussion. We get \\v — u\\ = 0(||-E||2) because if M = uu* , then 
Halloo = 1 in [El Karoui and d'Aspremont, 2009, Eq. 10]. This implies 

\\y - Ax \\ 2 = \\Ax,U\u -v\\ = ( W ~ ^){AxoYh \ = ( C \\b noise \\) 

V ll^olb / 

which is the desired result. ■ 

Note that normalizing y differently, we would obtain \\y — Axq\\2 < 4C||6 no i se ||2- We now show the main 
result of this section, according to which PhaseCut is "almost as stable as" Weak PhaseLift. In practice of 
course, the exact values of the stability constants has no importance, what matters is that they are of the 
same order. 

Theorem 4.6. Let A G C nxm ,for all x G C n , C > 0, if Weak PhaseLift is C-stable in x , then PhaseCut 
is (2C + 2\/2 + \)-stable in x . 

Proof. Let xq G C n , C > be such that Weak PhaseLift is C-stable in xq. Axq is a non-zero vector 
(because, with our definition, neither Weak PhaseLift nor PhaseCut may be stable in xq if Axq = and 
A / 0). We set D = 2C + 2^2 + 1 and suppose by contradiction that PhaseCut is not D-stable in xo. 
Let e > be arbitrary. Let 6 nj pc G M. n be such that ||6 n ,pc||2 < rnax(|| Aco||2, e /2) and such that, for 
b = \Axq\ + 6 ni pc, the minimizer Vpc of (13) verifies 

\\Vpc - (Axo)(Axo)*h > D\\Ax \\ 2 \\b nt p C \\ 2 
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Such a Vpc must exist or PhaseCut would be D-stable in xq. We call Vp C the restriction of Vpc to 
range(A) (that is, the matrix such that x*(Vp C )y = x*(Vpc)y if x,y G range(^4) and x*(Vp C )y = if x G 



range(A) or y G range(yl) ) and Vp C the restriction of Vpc to range(A) . Let us set 6 n ,PL = y Vj 



\Axq\u for i = 1, n. As Vp C G H+nJ 7 , F/ c minimizes (14) for 6 = |j4xo|+6 n ,PL (because Vp C G Tib). 
Lemmas A.l and A.2 (proven in the appendix) imply that \\Vp C — (Axo)(Axq)*\\2 > C||^4xo||2||^n,PL||2 
and 1 1 &n,F>L || 2 < e. As e is arbitrary, Weak PhaseLift is not C-stable in xq, which contradicts our hypotheses. 
Consequently, PhaseCut is (2C + 2\/2 + l)-stable in xq. ■ 

Theorem 4.6 is still true if we replace 2C + 2^2 + 1 by any D > 2C + y/2. We only have to 
replace, in the demonstration, the inequality ||6 nj pc||2 < ll^olb by II ^n,PC II 2 < "ll^olb with a = 
D — (2C + \/2)/(l + v2)- Also, the demonstration of this theorem is based on the fact that, when Vpc 
solves (13), one can construct some Vpl = Vp C close to Vpc, which is an approximate solution of (14). 
It is natural to wonder whether, conversely, from a solution Vpl of (14), one can construct an approx- 
imate solution Vpc of (13). It does not seem to be the case. One could for example imagine setting 
Vpc = di&g(R)VpL diag(i?), where Ri = b-i/ ' \JVplh- Then Vpc would not necessarily minimize (13) 
but a least belong to 'Kb- But ||Vpc — Vpl||2 might be quite large: (14) implies that || diag(Vpi) — fr 2 || s is 
small but, if some coefficients of b are very small, some Ri may still be huge, so diag(i?) 96 I. This does 
happen in practice (see § 5.5). 

4.5. Perturbation Results. We recall here sensitivity analysis results for semidefinite programming from 
Todd and Yildirim [2001]; Yildirim [2003], which produce explicit bounds on the impact of small pertur- 
bations in the observation vector b 2 on the solution V of the semidefinite program (11). Roughly speaking, 
these results show that if b 2 + b no i se remains in an explicit ellipsoid (called Dikin's ellipsoid), then interior 
point methods converge back to the solution in one full Newton step, hence the impact on V is linear, equal 
to the Newton step. These results are more numerical in nature than the stability bounds detailed in the pre- 
vious section, but they precisely quantify both the size and, perhaps more importantly, the geometry of the 
stability region. We assume that (V, Y, y) is a strictly feasible primal-dual point of the semidefinite program 
in (1 1), namely 



their symmetrized Kronecker product. As in Todd and Yildirim [2001]; Yildirim [2003], we define the 
following two operators 



where B G H n is a suitable matrix, varying according to the search direction (we can pick B = I). We also 
define the linear operator W such that 

Wx = diag(£ -1 J"diag(x)), x G M n , 
Todd and Yildirim [2001]; Yildirim [2003] then show the following result. 
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(P Q)X = -{PXQ* + QXP*), 



£ = {{l-AA ] ) -diag(y)) QB and T = BV I 



Proposition 4.7. Assume that (V, Y, y) is a primal dual strictly feasible point of the semidefinite program 
in (11). Suppose also that the equality constraints of problem (11) are perturbed by a noise component 
b noise to become b 2 + b noise , with b noise satisfying 



V -V 2 £- 1 Tdu^(W- 1 b n0 i a e)V- 1 / 2 
Y- 1 / 2 diag(^- 1 6„ oise )y- 1 / 2 



OC 



< i (IV) 
< 1 (18) 



J noise ) 



then the matrices 

V = V + «S- 1 J-diag(iy- 1 & noise ) Y = Y- d\^{W-\. 
form a strictly feasible pair for the perturbed program, with a duality gap lower than Tr(VY). 

Proof. See [Todd and Yildirim, 2001, Prop. 3.1]. Note that since the operator A(U) in this reference is 
simply diag(fy) in our case, the surjectivity condition is trivially satisfied. ■ 

Of course, optimal solutions of problem (11) tend to have low rank, hence are not strictly feasible. How- 
ever, interior point solvers actually produce a full "central path" of solutions parameterized by \i > 0, where 
Tr(VY) = [i, where n > controls the duality gap and VY = fjl on that path [Boyd and Vandenberghe, 
2004, Chap. 11]. This means that these solvers naturally produce approximate solution that are strictly 
feasible primal dual pairs, for which this perturbation result applies. 

4.6. Complexity Comparisons. Both the relaxation in PhaseLift and that in PhaseCut are semidefinite 
programs and we highlight below the relative complexity of solving these problems depending on algorith- 
mic choices and precision targets. Note that, in their numerical experiments, [Candes et al., 2011b] solve a 
penalized formulation of problem PhaseLift, written 

n 

min ^(TriaiaZX) - b 2 ) 2 + A Tr(X) (19) 

~ i=l 

in the variable X £ H p , for various values of the penalty parameter A > 0. 

The trace norm promotes a low rank solution, and solving a sequence of weighted trace-norm problems 
has been shown to further reduce the rank in [Fazel et al., 2003; Candes et al., 201 lb]. This method replaces 
Tr(X) by Tr(WfcX) where Wo is initialized to the identity /. Given a solution X^ of the resulting semidef- 
inite program, the weighted matrix is updated to Wk+i = (X^ + rjl)^ 1 (see Fazel et al. [2003] for details). 
We denote by K the total number of such iterations, typically of the order of 10. Trace minimization is 
not needed for the semidefinite program (PhaseCut), where the trace is fixed because we optimize over a 
normalized phase vector. However, weighted trace-norm iterations could potentially improve performance 
in PhaseCut as well. 

Recall that p is the size of the signal and n is the number of measured samples with n = Jp in the 
examples reviewed in Section 5. In the numerical experiments in [Candes et al., 2011b] as well as in this 
paper, J = 3, 4, 5. The complexity of solving the PhaseCut and PhaseLift relaxations in PhaseLift using 
generic semidefinite programming solvers such as SDPT3 [Toh et al., 1999], without exploiting structure, is 
given by 

O ( J 4 - 5 p 4 - 5 log and O J 2 p 4 - 5 log - 

for PhaseCut and PhaseLift respectively [Ben-Tal and Nemirovski, 2001, § 6.6.3]. The fact that the constraint 
matrices have only one nonzero coefficient in PhaseCut can be exploited (the fact that the constraints a^a* 
are rank one in PhaseLift bhelps, but it does not modify the principal complexity term) so we get 

O ( J 3 'V- 5 log-^) and O ( K J 2 p 4 - 5 log - 



e , 
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for PhaseCut and PhaseLift respectively using the algorithm in Helmberg et al. [1996] for example. If we 
use first-order solvers such as TFOCS [Becker et al., 2012], based on the optimal algorithm in [Nesterov, 
1983], the dependence on the dimension can be further reduced, to become 

(^!) and o(^f) 

for solving a penalized version of the PhaseCut relaxation and the penalized formulation of PhaseLift in (19). 
While the dependence on the signal dimensions p is somewhat reduced, the dependence on the target pre- 
cision grows from log(l/e) to 1/e. Finally, the iteration complexity of the block coordinate descent Algo- 
rithm 3 is substantially lower and its convergence is linear, but no fully explicit bounds on the number of 
iterations are known in our case. The complexity of the method is then bounded by 

O (log ~) 

but the constant in this bound depends on n here, and the dependence cannot be quantified explicitly. 

Algorithmic choices are ultimately guided by precision targets. If e is large enough so that a first-order 
solver or a block coordinate descent can be used, the complexity of PhaseCut is not significantly better 
than that of PhaseLift. On the contrary, when e is small, we must use an interior point solver, for which 
PhaseCut's complexity is an order of magnitude lower than that of PhaseLift because its constraint matrices 
are singletons. In practice, the target value for e strongly depends on the sampling matrix A. For example, 
when A corresponds to the convolution by 6 Gaussian random filters (§5.2), to reconstruct a Gaussian white 
noise of size 64 with a relative precision of rj, we typically need e ~ 2.10~ 1 ^. For 4 Cauchy wavelets (§5.3), 
it is twenty times less, with e ~ 10~ 2 ??. For other types of signals than Gaussian white noise, we may even 
need e ~ 10~ 3 rj. 

4.7. Greedy Refinement. If the PhaseCut or PhaseLift algorithms do not return a rank one matrix then an 
approximate solution of the phase recovery problem is obtained by extracting a leading eigenvector v. For 
PhaseCut and PhaseLift, x = A* diag(6)-u and x = v are respectively approximate solutions of the phase 
recovery problem with \Ax\ / b = \Ax\. This solution is then refined by applying the Gerchberg-Saxton 
algorithm initialized with x. If x is sufficiently close to x then, according to numerical experiments of 
Section 5, this greedy algorithm converges to Ax with |A| = 1. These greedy iterations require much 
less operations than PhaseCut and PhaseLift algorithms, and thus have no significant contribution to the 
computational complexity. 

4.8. Sparsity. Minimizing Tr(X) in the PhaseLift problem means looking for signals which match the 
modulus constraints and have minimum £2 norm. In some cases, we have a priori knowledge that the signal 
we are trying to reconstruct is sparse, i.e. Card(x) is small. The effect of imposing sparsity was studied in 
e.g. [Moravec et al., 2007; Osherovich et al., 2012; Li and Voroninski, 2012]. 

In that scenario, we typically have n < p, hence the set of solutions to \\Ax — diag(6)u||2 is written 
x = A^ diag(6)u + Fv where F is basis for the nullspace of A. The reconstruction problem with a £\ 
penalty promoting sparsity is then written 

minimize \\AA^ diag(6)u — diag(6)u||2 + 7||-<4J diag(fc)u + Fv\\\ 
subject to \v,i\ = 1, 

in the variables u G C p and y 6 C p_n . Using the fact that \\y\\i = \\yy* \\e 1 , this can be relaxed as 

minimize Tr(UM 3 ) + -y\\VUV*\\ ei 
subject to U >z 0, \Uu\ = 1, i = 1, . . . ,n, 

which is a semidefinite program in the (larger) matrix variable U € H p , with M3 the matrix containing M 
in its upper left block and zeros elsewhere, and V = (A^ diag(6), F). 
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(a) (b) (c) 

FIGURE 2. Real parts of sample test signals, (a) Gaussian white noise, (b) Sum of 6 
sinuoids of random frequency and random amplitudes, (c) Scan-line of an image. 



5. Numerical results 

In this section, we compare the numerical performance of the Gerchberg-Saxton (greedy), PhaseCut and 
PhaseLift algorithms on various phase recovery problems. As in [Candes et al., 2011b], the PhaseLift prob- 
lem is solved using the package in [Becker et al., 2012], with reweighting, using K = 10 outer iterations and 
1000 iterations of the first order algorithm. The PhaseCut and Gerchberg-Saxton algorithms described here 
are implemented in a public software package 1 . These phase recovery algorithms computes an approximate 
solution x from \Ax\ and the reconstruction error is measured by the relative Euclidean distance up to a 
complex phase given by 

e(x,x) = min - — - — J^— . (20) 

cgC,|c|=i ||x|| 

We also record the error over measured amplitudes, written 

«(| MW )4 ll^a . (2!) 

Note that when the phase recovery problem either does not admit a unique solution or is unstable, we usu- 
ally have e(|Ac|, \Ax\) <C e(x,x). In the next three subsections, we study these reconstruction errors for 
three different phase recovery problems, where A is defined as an oversampled Fourier transform, as mul- 
tiple filterings with random filters, or as a wavelet transform. Numerical results are computed on three 
different types of test signals x: realizations of a complex Gaussian white noise, sums of complex exponen- 
tials e lujm with random frequencies to and random amplitudes a w (the number of exponentials is random, 
around 6), and signals whose real and imaginary parts are scan-lines of natural images. Each signal has 
p = 128 coefficients. Figure 2 shows the real part of sample signals, for each signal type. 

5.1. Oversampled Fourier Transform. The discrete Fourier transform y of a signal y of q coefficients is 
written 

E —i2irkm 
y m exp( ) . 
Q 

In X-ray crystallography or diffraction imaging experiments, compactly supported signals are estimated 
from the amplitude of Fourier transforms oversampled by a factor J > 2. The corresponding operator A 
computes an oversampled discrete Fourier transform evaluated over n = Jp coefficients. The signal x of 



Available at http : / /www . cmap . poly technique . f r / scattering/ code/phaserecovery .zip. 
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Fourier 


Random Filters 


Wavelets 


Gerchberg-S axton 


5% 


49% 


0% 


PhaseLift with reweighting 


3% 


100% 


62% 


PhaseCut 


4% 


100% 


100% 



Table 1. Percentage of perfect reconstruction from \Ax\, over 300 test signals, for the 
three different operators A (columns) and the three algorithms (rows). 





Fourier 


Random Filters 


Wavelets 


Gerchberg-S axton 


0.9 


1.2 


1.3 


PhaseLift with reweighting 


0.8 


exact 


0.5 


PhaseCut 


0.8 


exact 


exact 



Table 2. Average relative signal reconstruction error e(x, x) over all test signals that are 
not perfectly reconstructed, for each operator A and each algorithm. 



size p is extended into x by adding (J — l)p zeros and 

{Ax)k =x k = 2_^x m exp( — ). 

m=l 

For this oversampled Fourier transform, the phase recovery problem does not have a unique solution [Aku- 
towicz, 1956]. In fact, one can show [Sanz, 1985] that there are as many as 2 P ~ 1 solutions x G C p such 
that \Ax\ = \Ax\. Moreover, increasing the oversampling factor J beyond 2 does not reduce the number of 
solutions. 

Because of this inuinsic instability, we will observe that all algorithms perform similarly on this type 
of reconstruction problems and Table 1 shows that the percentage of perfect reconstruction is below 5% 
for all methods. The signals which are perfectly recovered are sums of few sinusoids. Because these test 
signals are very sparse in the Fourier domain, the number of signals having identical Fourier coefficient 
amplitudes is considerably smaller than in typical sample signals. As a consequence, there is a small prob- 
ability (about 5%) of exactly reconstructing the original signal given an arbitrary initialization. None of 
the Gaussian random noises and image scan lines are exactly recovered. Note that we say that an exact 
reconstruction is reached when e(x, x) < 10~ 2 because a few iterations of the Gerchberg-Saxton algorithm 
from such an approximate solution x will typically converges to x. Numerical results are computed with 
100 sample signals in each of the 3 signal classes. 

Table 2 gives the average relative error e(x, x) over signals that are not perfectly reconstructed, which 
is of order one here. Despite this large error, Table 3 shows that the relative error e(|Ac|, \Ax\) over the 
Fourier modulus coefficients is below 10~ 3 for all algorithms. This is due to the non-uniqueness of the 
phase recovery from Fourier modulus coefficients. Recovering a solution x with identical or nearly identical 
oversampled Fourier modulus coefficients as x does not guarantee that x, is proportional to x. Overall, in this 
set of ill-posed Fourier experiments, recovery performance is very poor for all methods and the PhaseCut 
and PhaseLift relaxations do not improve much on the results of the faster Gerchberg-Saxton algorithm. 

5.2. Multiple Random Illumination Filters. To guarantee uniqueness of the phase recovery problem, one 
can add independent measurements by "illuminating" the object through J filters h? in the context of X-ray 
imaging or crystallography [Candes et al., 201 la]. The resulting operator A is the discrete Fourier transform 
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PhaseCut 
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exact 
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Table 3. Average relative error e(|^4x|, \Ax\) of coefficient amplitudes, over all test sig- 
nals that are not perfectly reconstructed, for each operator A and each algorithm. 



of x multiplied by each filter h? of size p 

(Ax)k+ P j = (xhJ) k = (x*hP)k for 1 < j < J and < k < p, 

where x * b? is the circular convolution between x and hP. Candes et al. [2011a] proved that, for some 
constant C > large enough, Cp Gaussian independent measurements are sufficient to perfectly reconstruct 
a signal of size p, with high probability. Similarly, we would expect that, picking the filters h? as realizations 
of independent Gaussian random variables, perfect reconstruction will be guaranteed with high probability 
if J is large enough (and independent of p). This result has not yet been proven because Gaussian filters do 
not give independent measurements but Candes et al. [2011b] observed that, empirically, for signals of size 
p = 128, with J = 4 filters, perfect recovery is achieved in 100% of experiments. 

Table 1 confirms this behavior and shows that the PhaseCut algorithm achieves perfect recovery in all 
our experiments. As predicted by the equivalence results presented in the previous section, we observe that 
PhaseCut and PhaseLift have identical performance in these experiments. With 4 filters, the solutions of 
these two SDP relaxations are not of rank one but are "almost" of rank one, in the sense that their first 
eigenvector v has an eigenvalue much larger than the others, by a factor of about 5 to 10. Numerically, we 
observe that the corresponding approximate solutions, x, = diag(w)6, yield a relative error e(| Ae|, \Ax\) 
which, for scan-lines of images and especially for Gaussian signals, is of the order of the ratio between the 
largest and the second largest eigenvalue of the matrix U. The resulting solutions x are then sufficiently 
close to x so that a few iterations of the Gerchberg-Saxton algorithm started at x will converge to x. 

Table 1 shows however that directly applying the Gerchberg-Saxton algorithm starting from a random 
initialization point yields perfect recovery in only about 50% of our experiments. This percentage decreases 
as the signal size p increases. The average error e(x, x) on non-recovered signals in Table 2 is 1.3 whereas 
on the average error on the modulus e(|Ac|, \Ax\) is 0.2. 

5.3. Wavelet Transform. Phase recovery problems from the modulus of wavelet coefficients appear in 
audio signal processing where this modulus is used by many audio and speech recognition systems. These 
moduli also provide physiological models of cochlear signals in the ear [Chi et al., 2005] and recovering 
audio signals from wavelet modulus coefficients is an important problem in this context. 

To simplify experiments, we consider wavelets dilated by dyadic factors 2 J which have a lower frequency 
resolution than audio wavelets. A discrete wavelet transform is computed by circular convolutions with 
discrete wavelet filters, i.e. 

p 

(Ax) k+jp = (x * ^ r ) k = ^2 x m^{- m for 1 < 3 < J ~ 1 and 1 < k < p 

m=l 

where ipm is a p periodic wavelet filter. It is defined by dilating, sampling and periodizing a complex wavelet 
i/> £ L 2 (C), with 

oo 

ip J m = ip(2?(Tn/p — k)) for 1 < m < p. 

fc=-oo 20 



Numerical computations are performed with a Cauchy wavelet whose Fourier transform is 

i>(oo) = oj d e-" l w>0 , 

up to a scaling factor, with d = 5. To guarantee that A is an invertible operator, the lowest signal frequencies 
are carried by a suitable low-pass filter <p and 

(Ax) k+ jp = (x* 4>) k for 1 < k < p. 

One can prove that x is always uniquely determined by \Ax\, up to a multiplication factor. When x is real, 
the reconstruction appears to be numerically stable. Recall that the results of §3.6.4 allow us to explicitly 
impose the condition that x is real in the PhaseCut recovery algorithm. For PhaseLift in Candes et al. 
[2011b], this condition is enforced by imposing that X = xx* is real. For the Gerchberg-Saxton algorithm, 
when x is real, we simply project at each iteration on the image of W° by A, instead of projecting on the 
image of C p by A. 

Numerical experiments are performed on the real part of the complex test signals. Table 1 shows that 
Gerchberg-Saxton does not reconstruct exactly any real test signal from the modulus of its wavelet coeffi- 
cients. The average relative error e(x, x) in Table 2 is 1.2 where the coefficient amplitudes have an average 
error e(| \Ax\) of 0.3 in Table 3. 

PhaseLift reconstructs 62% of test signals, but the reconstruction rate varies with signal type. The pro- 
portions of exactly reconstructed signals among random noises, sums of sinusoids and image scan-lines are 
27%, 60% and 99% respectively. Indeed, image scan-lines have a large proportion of wavelet coefficients 
whose amplitudes are negligible. The proportion of phase coefficients having a strong impact on the recon- 
struction of x is thus much smaller for scan-line images than for random noises, which reduces the number 
of significant variables to recover. Sums of sinuoids of random frequency have wavelet coefficients whose 
sparsity is intermediate between image scan-lines and Gaussian white noises, which explains the interme- 
diate performance of PhaseLift on these signals. The overall average error e(x, x) on non-reconstructed 
signals is 0.5. Despite this relatively important error, x and x are usually almost equal on most of their 
support, up to a sign switch, and the importance of the error is precisely due to the local phase inversions 
(which change signs). 

The PhaseCut algorithm reconstructs exactly all test signals. Moreover, the recovered matrix U is al- 
ways of rank one and it is therefore not necessary to refine the solution with Gerchberg-Saxton iterations. 
At first sight, this difference in performance between PhaseCut and PhaseLift may seem to contradict the 
equivalence results of §4.3 (which are still valid when we take advantage of the fact that x is real). It can be 
explained however by the fact that 10 steps of reweighing and 1000 inner iterations per step are not enough 
to let PhaseLift fully converge. In these experiments, the precision required to get perfect reconstruction is 
very high and, consequently, the number of first-order iterations required to achieve it is too large (see §4.6). 
With an interior-point-solver, this number would be much smaller but the time required per iteration would 
become prohibitively large. The much simpler structure of the PhaseCut relaxation allows us to solve these 
larger problems more efficiently. 

5.4. Impact of Trace Minimization. We saw in §4.1 that, in the absence of noise, PhaseCut was very 
similar to a simplified version of PhaseLift, Weak PhaseLift, in which no trace minimization is performed. 
Here, we confirm empirically that Weak PhaseLift and PhaseLift are essentially equivalent. Minimizing the 
trace is usually used as rank minimization heuristic, with recovery guarantees in certain settings [Fazel et al., 
2003; Candes and Recht, 2008; Chandrasekaran et al., 2010] but it does not seem to make much difference 
here. In fact, Demanet and Hand [2012] recently showed that in the independent experiments setting, Weak 
PhaseLift has a unique (rank one) solution with high probability, i.e. the feasible set of PhaseLift is a 
singleton and trace minimization has no impact. Of course, from a numerical point of view, solving the 
feasibility problem Weak PhaseLift is about as hard as solving the trace minimization problem PhaseLift, so 
the result [Demanet and Hand, 2012] simplifies analysis but does not really affect numerical performance. 
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Number of measurements Number of measurements 

FIGURE 3. Comparison of PhaseLift and Weak PhaseLift performance, for 64-sized sig- 
nals, as a function of the number of measurements. Reconstruction rate, after Gerch- 
berg-Saxton iterations (left) and proportion of rank one solutions (right). 



Figure 3 compares the performances of PhaseLift and Weak PhaseLift as a function of n (the number 
of measurements). We plot the percentage of successful reconstructions (left) and the percentage of cases 
where the relaxation was exact, i.e. the reconstructed matrix X was rank one (right). The plot shows a 
clear phase transitions when the number of measurements increases. For PhaseLift, these transitions happen 
respectively at n = 155 « 2.5p and n = 285 « A.5p, while for Weak PhaseLift, the values become 
n = 170 xs 2.7p and n = 295 4.6p, so the transition thresholds are very similar. Note that, in the absence 
of noise, Weak PhaseLift and PhaseCut have the same solutions, up to a linear transformation (see §4.2), so 
we can expect the same behavior in the comparison PhaseCut versus PhaseCutMod. 

5.5. Reconstruction in the Presence of Noise. Numerical stability is crucial for practical applications. In 
this last subsection, we suppose that the vector b of measurements is of the form 

b = | Arc | + 6 no i se 

with ||6 no i se ||2 = o(|| Ar||2). In our experiments, fo n oise is always a Gaussian white noise. Two reasons can 
explain numerical instabilities in the solution x. First, the reconstruction problem itself can be unstable, 
with \\x — cx || >> Hl-Axl — \Ax\\\ for all c G C. Second, the algorithm may fail to reconstruct x such 
that \\\Ax\ — 6|| « 1 1 fenoise 1 1 - No algorithm can overcome the first cause but good reconstruction methods 
will overcome the second one. In the following paragraphs, to complement the results in §4.4, we will 
demonstrate empirically that PhaseCut is stable, and compare its performances with PhaseLift. We will 
observe in particular that PhaseCut appears to be more stable than PhaseLift when b is sparse. 

5.5.1. Wavelet transform. Figure 4 displays the performance of PhaseCut in the wavelet transform case. It 
shows that PhaseCut is stable up to around 5 — 10% of noise. Indeed, the reconstructed x usually satisfies 
e(\Ax\, \Ax\) = || |Ar|-|,4x| || 2 < || frnoise 1 1 2> which is the best we can hope for. Wavelet transform is a case 
where the underlying phase retrieval problem may present instabilities, therefore the reconstruction error 
e(x, x) is sometimes much larger than e(| Ac|, | Ar|). This remark applies especially to sums of sinusoids, 
which represent the most unstable case. 

When all coefficients of Ax have approximately the same amplitude, PhaseLift and PhaseCut produce 
similar results, but when Ax is sparse, PhaseLift appears less stable. We gave a qualitative explanation of 
this behavior at the end of §4.4 which seems to be confirmed by the results in Figure 4. This boils down to 
the fact that the values of the phase variables in PhaseCut corresponding to zeros in b can be set to zero so 
the problem becomes much smaller. Indeed, the performance of PhaseLift and PhaseCut are equivalent in 
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Figure 4. Mean reconstruction errors versus amount of noise for PhaseLift and PhaseCut, 
both in decimal logarithmic scale, for three types of signals: Gaussian white noises, sums 
of sinusoids and scan-lines of images. Both algorithms were followed by a few hundred 
Gerchberg-Saxton iterations. 



the case of Gaussian random filters (where measurements are never sparse), they are a bit worse in the case 
of sinusoids (where measurements are sometimes sparse) and quite unsatisfactory for scan-lines of images 
(where measurements are always sparse). 

5.5.2. Multiple random illumination filters. Candes and Li [2012] proved that, if A was a Gaussian matrix, 
the reconstruction problem was stable with high probability, and PhaseLift reconstructed a x such that 

£( ^°(w)- 

The same result seems to hold for A corresponding to Gaussian random illumination filters (cf. §5.2). 
Moreover, PhaseCut is as stable as PhaseLift. Actually, up to 20% of noise, when followed by some Ger- 
chberg-Saxton iterations, PhaseCut and PhaseLift almost always reconstruct the same function. Figure 5 
displays the corresponding empirical performance, confirming that both algorithms are stable. The relative 
reconstruction errors are approximately linear in the amount of noise, with 

e(|Ar|, \Ax\) « 0.8 x ^ nois ^ 2 and e(x,x) « 2 x ^"° 1S ^ 2 
|L4a;||2 ir , \\Ax\\2 
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FIGURE 5. Mean performances of PhaseLift and PhaseCut, followed by Gerchberg-Saxton 
iterations, for four Gaussian random illumination filters. The x-axis represents the relative 
noise level, H&noiselb/H^lh and the y-axis the relative error on the result, which is either 

e(x, x) or e(|A5|, \Ax\). 
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Figure 6. Loglog plot of the relative error over the matrix reconstructed by PhaseLift 
(resp. PhaseCut) when A represents the convolution by five Gaussian filters. Black curves 
correspond to Ax non-sparse, red ones to sparse Ax. 



in our experiments. 

The impact of the sparsity of b discussed in the last paragraph may seem irrelevant here: if A and x are 
independently chosen, Ax is never sparse. However, if we do not choose A and x independently, we may 
achieve partial sparsity. We performed tests for the case of five Gaussian random filters, where we chose 
x G C 64 such that (Ax)k = for k < 60. This choice has no particular physical interpretation but it allows 
us to check that the influence of sparsity in \Ax\ over PhaseLift is not specific to the wavelet transform. 
Figure 6 displays the relative error over the reconstructed matrix in the sparse and non-sparse cases. If we 
denote by X p \ £ C pxp (resp. X pc G C nxn ) the matrix reconstructed by PhaseLift (resp. PhaseCut), this 
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relative error is defined by 



\AX pl A* - (Ax)(Ax)*\\ 2 

(for PhaseLift) 



\\(Ax)(Ax)*\\ 2 

(Ax)(Ax)*\\n 

(for PhaseCut) 



diag(6)X pc diag(6) - (Ax) (Ax 



\\(Ax)(Ax) 



In the non-sparse case, both algorithms yield very similar error e 7||6 no i se ||2/||^4x||2 (the difference 
for a relative noise of 10 -4 may come from a computational artifact). In the sparse case, there are less 
phases to reconstruct, because we do not need to reconstruct the phase of null measurements. Consequently, 
the problem is better constrained and we expect the algorithms to be more stable. Indeed, the relative 
errors over the reconstructed matrices are smaller. However, in this case, the performance of PhaseLift 
and PhaseC ut do not match anymore : e 3 1 1 6 n oise 1 1 2 / 1 1 Ax \ | 2 for PhaseLift and e pa 1 . 2 1 1 6 no i S e 1 1 2 / 1 1 Ax \ | 2 
for PhaseCut. This remark has no practical impact in our particular example here because taking a few 
Gerchberg-Saxton iterations would likely make both methods converge towards the same solution, but it 
confirms the importance of accounting for the sparsity of \Ax\. 
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Appendix A. Technical lemmas 
In this brief appendix, we prove the two technical lemmas used in the proof of Theorem 4.6. 



Lemma A.l. Under the assumptions and notations of Theorem 4.6, we have 



V£ c - (Ax )(Ax )*h > 2C||Ax ||2|| On.PC||2 



Proof. We first give an upper bound of \\Vpc — VpcWz- We use the Cauchy-Schwarz inequality : for 
every positive matrix X and all x, y, \x*Xy\ < Vx*Xx^y*Xy. Let {/«} be an hermitian base of range (^4) 
diagonalizing Vp C and j^} an hermitian base of range(A)- 1 diagonalizing Vp C . As {/j} n {g{\ is an 
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hermitian base of C n , we have 

\\Vpc - Vl c \\l = Y)K{Vpc ~ vfc)M 2 + £|/*(Vpc - Vf c ) 9l \ 2 

i,j 

+ Y.\9^ v pc - Vi c )fi\ 2 + Ebi(Vpc - Vlc)9j>\ 2 

2 



= 2j2\f*(V P c)g 3 \ 2 + Y;\9;(Vpc)g: 
< 2Y t \f?(y P c)fi\\g3(Vpc)g j \ + (Yjt(Ypc)9-. 
= 2 Tr Vl ^ V pc + (Tr V^f 



< [V2 V Tr vL J Tr + Tr V^ c (22) 



2 



PC\I ' ' V PC ' ' V PC 

Let us now bound Tr V£ c . We first note that Tr V^ c = Tr((I - AA^)V PC (I - AA*)) = Tr(Vp C (I - 
AA^)) = di(Vpc, -T 7 ) (according to lemma 4.1). Let u E C n be such that, for all i, \m\ = 1 and (Axo)% = 
Ui\Axo\i. We set b = \Axo\ + 6 nj pc and V = (b x u)(b x it)*. AsVeH+n% and Vpc minimizes (13), 

Tr Vp- C = d^VpcT) < dt(V,T) 

= di((Ax + b n p C u)(Ax + b^pcu)*,^) 

= ^((fr^PC^X&n.PC^)*,-? 7 ) 
< \\(bn,PCU)(bn,PCu)*\\i 

= Tr(b n p c u)(bn,pcu)* = ||&n,Pc||| (23) 

We also have Tr v/ c = Tr V PC - Tr V^. This equality comes from the fact that, if {/;} is an hermitian 
base of range(A) and {gi} an hermitian base of range(^4)- L , then 

Tr Vpc = J>Vrotf + YjiVpcgt 

i i 

= Yl^ V icfi + Y.^ V PC9i = Tr Vl c + Tr V PC 

i i 

As Vp^c t 0, Tr Vf c < Tr V PC = \\\Ax \ + &h,pc||| and > bv combining this with relations (22) and (23), 
we get 

1 1 Vpc ~ Vpch - ^lll^ol + ^pclbll^pclb + ||&n,pc||l 

< \/2||Axo||2||&n,Pc||2 + (l + V^)|| °n,PC||2 

And, by reminding that we assumed ||6 n pclla — \\Axq\\2, 

\\vf c - (Ax )(Ax y\\ 2 > \\Vpc - (Ax )(Ax )*\\ 2 - Wv'c - Vpch 

> D\\Axo\\ 2 \\bn,?ch ~ \/2||Ax ||2||bn,Pc||2 ~ (1 + >/2)||6n ( PG||l 

> (£)-2>/2-l)||Aro||2||6n,PG||2=2C||Aro||2||6n,Pc||2 

which concludes the proof. ■ 

Lemma A. 2. Under the assumptions and notations of Theorem 4.6, we have 

||&n,Plj2 < 2||6 n ,Pc|| 
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Proof. Let ej be the i-th vector of C n 's canonical base. We set a = fi + g% where /, G range (A) and 
gt G range (A)- 1 . 

Vpcii = e*V PC ei 

= ttvlcfi + 2Re(f*V PC gi) + g\V^cm 
= vl Cll + 2Re(f:V PC g i ) + V P L Cii 



Because \f*V PC gi\ < y/ftVpcfiy/gfVwfi = JV» Cii Jv£ Cii , 



(\/Vpcii ~ JVpca) 2 < Vpcu < Nv' + \ Vir Cll ? 



Hen - ^JVpcii < VVpcTi < iHca + V^c 



So 



|&n,PL,i| = \yV" Cii ~ \AX Q 



< \V v icu ~ VVpcul + \VVpcu-\Ax \ 



< V V PCii + b n,PC,i 



b-a^hh < || s yVpcu f II 2 + lrn,pc||2 



and, by (23), 



'Tr V£ c + ||6n,P C ||2 < 2||6n,pc||2 
which concludes the proof. ■ 
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